[BEAM-3074] Stage the portable pipeline; put URL in pipeline options#4090
Conversation
|
R: @aaltay Same as last time; new place to put the info. The way I inspect pipeline options in the tests seems unpleasant. |
|
R: @robertwb wdyt? |
robertwb
left a comment
There was a problem hiding this comment.
I agree these tests are kind of icky, but it seems in line with the rest of the tests here. Some questions about the deleted lines though.
| kind='local' if self.local else 'harness', | ||
| packages=package_descriptors, | ||
| # https://issues.apache.org/jira/browse/BEAM-3116 | ||
| # metadata=dataflow.WorkerPool.MetadataValue(), |
There was a problem hiding this comment.
Should this line be uncommented rather than deleted?
| servicePath=self.google_cloud_options.dataflow_endpoint))) | ||
|
|
||
| # https://issues.apache.org/jira/browse/BEAM-3116 | ||
| # pool.metadata.additionalProperties.append( |
| if additionalProperty.key == 'options': | ||
| recovered_options = additionalProperty.value | ||
| break | ||
| self.assertIsNotNone(pipeline_options, |
There was a problem hiding this comment.
for loops can have an "else" clause that's triggered if no break was encountered.
| if additionalProperty.key == 'options': | ||
| recovered_options = additionalProperty.value | ||
| break | ||
| self.assertIsNotNone(pipeline_options, |
There was a problem hiding this comment.
robertwb wrote:
for loops can have an "else" clause that's triggered if no break was encountered.
Nice. Done.
| kind='local' if self.local else 'harness', | ||
| packages=package_descriptors, | ||
| # https://issues.apache.org/jira/browse/BEAM-3116 | ||
| # metadata=dataflow.WorkerPool.MetadataValue(), |
There was a problem hiding this comment.
robertwb wrote:
Should this line be uncommented rather than deleted?
Actually they can't work right now. The issue is an internal Dataflow bug that manifests as BEAM-3116. TL;DR: we cannot use worker pool metadata at all.
Template jobs are encoded in proto3 JSON format, more or less, and when they are run they are decoded via proto2 JSON format. This bug went undetected until now because there were no nonempty maps. Longer term, templates should use a proto binary encoding and perhaps be created service-side.
For now, putting this in pipeline options instead, as a generic "bag of values".
There was a problem hiding this comment.
Shouldn't we leave these comments here as a reminder that we need to fix the bug and pass the pool.metadata (here or elsewhere)? Or is this just some cleanup (that's irrelevant to the other changes in this PR?)
There was a problem hiding this comment.
I have an internal bug filed, and discussion on going, on fixing the pool metadata situation as well as designing the very best place for this sort of thing. IMO for the purposes of this PR both pool metadata and pipeline options are both sort of fine places to put this data, but neither are obviously the Right Thing To Do. So I really do mean to fully abandon ever trying to put this data in the pool metadata.
There was a problem hiding this comment.
Thanks for the context. LGTM.
da2a280 to
6e682c6
Compare
|
It looks like the usual failure in |
|
run python precommit |
84691da to
101339d
Compare
101339d to
97bfdcb
Compare
|
PTAL - fixed up lint issues. |
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue.mvn clean verifyto make sure basic checks pass. A more thorough check will be performed on your pull request automatically.